Refactor json_serialization_test.py to use schema-based hierarchical tests#338
Refactor json_serialization_test.py to use schema-based hierarchical tests#338rwxayheee wants to merge 17 commits intoforlilab:developfrom
json_serialization_test.py to use schema-based hierarchical tests#338Conversation
|
Below is a little more technical details for future reference: Design explainedThis PR rewrites the tests using a schema-first principle to exhaustively validate (i) the serialization and deserialization process of JSON parsable classes, and (ii) the deep equality of original and decoded instances, based on the internal structure of the original object. The main components in the updated tests are:
# iterate over nested classes in the Polymer hierarchy
@pytest.mark.parametrize("cls", [
Polymer,
Monomer,
RDKitMoleculeSetup,
])
# check for seralization/deserialization and deep equality
def test_json_roundtrip(cls, populated_polymer):
"""Tests starting from a populated polymer object"""
for obj in subobject_factory(cls, populated_polymer):
json_str = obj.to_json()
decoded = cls.from_json(json_str)
assert isinstance(decoded, cls)
deep_assert_equal(decoded, obj)
|
Bugs discoveredRunning the tests points out some minor issues: These will be fixed in additional, separate commits to address each issue. |
because roundtrip MolToSmarts-MolFromSmarts not preserving all atom specs for class ResiduePadder attribute adjacent_smartsmol
…msd_symmetry_indices in decoding logic
Issues and solutions(1) The This problem was discussed in rdkit/rdkit#3193. Here it occurred because we are serializing/deserializing SMARTS via a molecule attribute: class from rdkit import Chem
smarts_in = "[P:11]([O-:12])(=[O:13])[O:14][C:15]"
mol_1 = Chem.MolFromSmarts(smarts_in)
smarts_2 = Chem.MolToSmarts(mol_1)
print(f"Original SMARTS: {smarts_in}")
print(f"Converted SMARTS: {smarts_2}")
mol_2 = Chem.MolFromSmarts(smarts_2)
# Compare charges
for a1, a2 in zip(mol_1.GetAtoms(), mol_2.GetAtoms()):
print(f"Atom {a1.GetSymbol()} — original: {a1.GetFormalCharge()}, decoded: {a2.GetFormalCharge()}")Output: (2) Small type fix for class RDKitMoleculeSetup attribute rmsd_symmetry_indices in encoding logic (63fb250) (3) Some bug fixes within class (4) Remove unused attribute (5) Similar to (2), restores tuple (converting list to tuple in decoding logic) in nested lists under class |
|
(970b1e9) are improvements to attribute detection and comparison in the deep equality test. Now, the tests are able to capture the following problems: What if...?(1) What if we add an attribute in the For example, as discussed in conversation #306 (comment), adding _bonds = {}
for key, bond_list in bonds.items():
monomer1 = self.monomers[key[0]]
monomer2 = self.monomers[key[1]]
if monomer1.rdkit_mol is None or monomer2.rdkit_mol is None:
continue
invmap1 = {j: i for i, j in monomer1.mapidx_to_raw.items()}
invmap2 = {j: i for i, j in monomer2.mapidx_to_raw.items()}
_bonds[key] = [(invmap1[b[0]], invmap2[b[1]]) for b in bond_list]
bonds = _bonds
self.bonds = bondsExpected pytest output (2) What if the original instance for some reason got additional attribute that wasn't passed to the decoded object? Example output (3) What if the decoded instance for some reason got additional attribute that wasn't present in original object? Example output |
|
The updated test script currently includes four separable (mutually independent) test groups in Developer Usage: Steps to write a new, independent hierarchical testHere's a practical example: step 1: Import the necessary data and create a fixture as needed pkgdir = pathlib.Path(meeko.__file__).parents[1]
ahhy_v061_json = pkgdir / "test/polymer_data/AHHY-v0.6.1.json"
@pytest.fixture
def populated_polymer_v061():
"""fixture for a populated polymer object, from a v0.6.1 JSON file"""
with open(ahhy_v061_json) as f:
json_str = f.read()
polymer = Polymer.from_json(json_str)
if len(polymer.monomers) == 0:
raise ValueError("Polymer creation failed")
return polymerstep 2: Define an object factory def lazy_factory(cls, root):
"""Factory function to create a single object based on the class and root object."""
if isinstance(root, Polymer):
if cls is Polymer:
return root
monomers = list(root.monomers.values())
if not monomers:
raise ValueError("No monomers found in Polymer")
last_monomer = monomers[-1]
if cls is Monomer:
return last_monomer
if cls is RDKitMoleculeSetup:
return last_monomer.molsetup
raise ValueError(f"Unexpected class or root: {cls}, {type(root)}")step 3: Define the class to be iterated and include nested JSON-parsable classes, if any @pytest.mark.parametrize("cls", [
Polymer,
Monomer,
RDKitMoleculeSetup,
])step 4: Write the def test_json_roundtrip_lazy(cls, populated_polymer_v061):
"""Lazy tests starting from a populated polymer object"""
obj = lazy_factory(cls, populated_polymer_v061)
if obj is None:
warnings.warn(f"Subobject of type {cls.__name__} is None — skipping.", stacklevel=1)
pass
json_str = obj.to_json()
decoded = cls.from_json(json_str)
assert isinstance(decoded, cls)
deep_assert_equal(decoded, obj)The fixture (test case), object factory (scope) & classes to be iterated (depth of testing), and the testing logics are all flexible. |
(1) incorrect var name in padder parsing from data (2) scope issue of default data_path
multiple polymer fixtures; let pass when original obj is None; remove functionally redundant test
|
(Just realized that (working on two small fixes before finalizing.) |
|
The test script is finalized, but the tests are not passing without exceptions ( |
|
This should be ready. Overall, this PR has a rewritten version of the JSON roundtrip tests, and some minor fixes. I will write some more tests for residue templates in another PR. |
This is a rewrite of
json_serialization_test.pyto achieve the three major objectives:Code efficiency
This PR follows the hierarchical spirit of the original JSON tests, in a schema-first way with minimal redundant codes. Hopefully this is easier for long-term maintainance and automated test coverage assessment.
Ensure exhaustive attribute checking
This PR exhaustively checks all attributes of all JSON-parsable classes, without hardcoding the list of attributes to check. As requested in added function stitch from PR188, with options
residues_to_addandbonds_to_use#306 (comment) by @diogomart, the JSON-interchange logics of new attributes must be checked. This PR demands all attributes to be evaluated in original and decoded object, unless explicitly ignored. Otherwise, pytest will throw errors.Deep equality check and bug fixes
This PR implements a deep recursive equality check, instead of hardcoding how each attribute should be evaluated. Because of this, some bugs are exposed.
(Bug fixes will be added in separate commits, in case we want to fix actual problems without major changes of the tests)